[Speculative Decoding] fix mtp stop_seqs and limit thinking bugs by lonelygsh · Pull Request #7166 · PaddlePaddle/FastDeploy

lonelygsh · 2026-04-02T12:54:55Z

Motivation

This PR fixes indexing errors in two speculative decoding kernels, speculate_set_stop_value_multi_seqs and speculate_limit_thinking_content_length, caused by a change in the semantics of step_idx.

Modifications

speculate_set_stop_value_multi_seqs

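Fix the can_stop check: step_idx_now >= min_token_limit → step_idx_now + accept_num >= min_token_limit, because step_idx no longer includes the tokens accepted in the current round.
Fix the skip condition: step_idx_now - accept_num + accept_idx + 1 < stop_seq_len → step_idx_now + accept_idx + 1 < stop_seq_len, removing the -accept_num offset left over from the old semantics.
Fix the accept-token routing condition: stop_seq_len - 1 - i < accept_idx → stop_seq_len - 1 - i <= accept_idx, so that accept_idx maps directly to the accepted token at which the stop sequence ends, which makes the semantics clearer.
Fix the accept_tokens index: remove the redundant -1 offset.
Fix the pre_ids_idx computation: step_idx_now - accept_num + accept_idx - offset → step_idx_now + accept_idx - offset, removing the -accept_num offset left over from the old semantics.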
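
The index arithmetic listed above can be sketched in plain Python. This is a minimal model, not the CUDA kernel: it assumes pre_ids is a flat, 0-indexed list holding exactly step_idx_now history tokens, and that accept_tokens[k] is the k-th token accepted in the current round. The function names and the reading of offset as stop_seq_len - 1 - i are illustrative assumptions.

```python
from typing import Sequence


def can_stop(step_idx_now: int, accept_num: int, min_token_limit: int) -> bool:
    # step_idx_now counts only history tokens, so the tokens accepted in
    # this round are added before comparing against the minimum length.
    return step_idx_now + accept_num >= min_token_limit


def stop_seq_ends_at(
    pre_ids: Sequence[int],
    step_idx_now: int,
    accept_tokens: Sequence[int],
    accept_idx: int,
    stop_seq: Sequence[int],
) -> bool:
    """Return True if stop_seq ends exactly at accept_tokens[accept_idx].

    Assumption (not taken from the kernel source): pre_ids[0:step_idx_now]
    is the token history before the current round.
    """
    stop_seq_len = len(stop_seq)
    # Skip condition: not enough tokens generated so far to hold stop_seq.
    if step_idx_now + accept_idx + 1 < stop_seq_len:
        return False
    # Walk the stop sequence backwards from its last token.
    for i in range(stop_seq_len - 1, -1, -1):
        offset = stop_seq_len - 1 - i  # distance back from accept_idx
        if offset <= accept_idx:
            # Routing: the token is one of this round's accepted tokens.
            token = accept_tokens[accept_idx - offset]
        else:
            # Otherwise it lives in the history, with no -accept_num shift.
            token = pre_ids[step_idx_now + accept_idx - offset]
        if token != stop_seq[i]:
            return False
    return True
```

For example, with history [10, 11, 12] and accepted tokens [13, 14, 15], the stop sequence [12, 13] spans the boundary between history and the current round and ends at accept_idx 0. What the kernel does after a match (for instance writing an end token) is deliberately left out of this sketch.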

speculate_limit_thinking_content_length

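Fix the current_base_step computation: step_idx[bid] - original_accept_num + 1 → step_idx[bid] + 1, to match the new step_idx semantics.
Remove the step_idx rollback logic: step_idx is no longer modified when accept_num is truncated.
Make the step_idx parameter const: the kernel no longer writes to step_idx, so the const_cast at the call site is removed.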
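
A small worked example may help show why the current_base_step formula had to change. It assumes, as the PR describes, that the old step_idx already included the tokens accepted in the current round while the new step_idx counts only earlier tokens. The function names are hypothetical and exist only for this illustration.

```python
def base_step_old(step_idx_old: int, original_accept_num: int) -> int:
    # Old semantics: step_idx already includes this round's accepted tokens,
    # so they are subtracted to get back to the first token of the round.
    return step_idx_old - original_accept_num + 1


def base_step_new(step_idx_new: int) -> int:
    # New semantics: step_idx counts only tokens from earlier rounds,
    # so the first token of this round is simply the next step.
    return step_idx_new + 1


def misapplied_old_formula(step_idx_new: int, original_accept_num: int) -> int:
    # The bug pattern: the old formula applied to a new-semantics step_idx
    # lands original_accept_num steps too early.
    return step_idx_new - original_accept_num + 1
```

With 7 history tokens and 3 accepted tokens, base_step_old(10, 3) and base_step_new(7) both give 8, whereas misapplied_old_formula(7, 3) gives 5.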

Tests

Update test_speculate_set_stop_value_multi_seqs.py so that its indexing and matching logic follow the new step_idx semantics.

Usage or Command

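No new interfaces; this PR fixes existing logic. To verify, run speculative decoding inference and check that stop-sequence truncation and the thinking-length limit behave correctly.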

Accuracy Tests

Unit tests pass.

Checklist

Add at least a tag in the PR title.
Format your code, run pre-commit before commit.
Add unit tests. Updated test_speculate_set_stop_value_multi_seqs.py.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-04-02T12:55:04Z

Thanks for your contribution!

CLAassistant · 2026-04-02T12:55:10Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

guanshihui seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

codecov-commenter · 2026-04-02T15:22:41Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@bb1f977). Learn more about missing BASE report.

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #7166   +/-   ##
==========================================
  Coverage           ?   73.62%           
==========================================
  Files              ?      383           
  Lines              ?    53513           
  Branches           ?     8378           
==========================================
  Hits               ?    39401           
  Misses             ?    11361           
  Partials           ?     2751

Flag	Coverage Δ
GPU	`73.62% <ø> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

…_stop_value kernels
- speculate_limit_thinking_content_length: update current_base_step to step_idx+1 (step_idx now records history count before current round); remove incorrect step_idx decrement on accept_num truncation; mark step_idx param as const.
- speculate_set_stop_value_multi_seqs: fix can_stop gate to use step_idx_now+accept_num>=min_token_limit; fix skip check and pre_ids_idx formula (remove stale -accept_num offset); use <= condition so accept_idx maps directly to the accepted token that ends the stop sequence; fix accept_tokens index (remove -1).
- Update unit tests for speculate_set_stop_value_multi_seqs kernel.

fastdeploy-bot

🤖 AI Code Review | 2026-04-08

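📋 Review Summary

PR overview: fixes indexing errors in two speculative decoding kernels caused by the change in step_idx semantics

Scope of changes: custom_ops/gpu_ops/speculate_decoding/ (2 CUDA kernels + 1 test file)

Impact tags: [Speculative Decoding] [BugFix]

PR Convention Check

The PR meets the conventions:

✅ Title includes the [Speculative Decoding] tag
✅ Description includes Motivation and Modifications
✅ Test changes are described

Issues

No blocking issues found.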

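Overall Assessment

This PR fixes the index computation errors that arose after the semantics of step_idx changed from "includes the current round's tokens" to "excludes the current round's tokens". Based on code analysis, the fixes in both kernels are correct:

speculate_set_stop_value_multi_seqs.cu:
- can_stop check: step_idx_now >= min_token_limit → step_idx_now + accept_num >= min_token_limit ✓
- The skip condition, accept-token routing, and index computation all correctly drop the -accept_num offset left over from the old semantics ✓
- A new bounds guard, accept_idx <= accept_num - 2, prevents an out-of-bounds eos write ✓
speculate_limit_thinking_content_length.cu:
- The current_base_step computation is fixed correctly ✓
- The step_idx rollback logic is removed, consistent with read-only semantics ✓
- Changing the parameter to const int64_t* is semantically correct ✓
Test coverage is sufficient:
- The reference implementation is updated to match the CUDA kernel logic
- test_stop_seq_at_last_position_not_detected is added to verify the boundary behavior
- All assertions match the expected output under the new semantics

Other modules that use step_idx (speculate_verify.cu, unified_update_model_status.cu, and others) were confirmed to be semantically consistent and need no corresponding changes.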
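
The bounds guard mentioned in the review can be illustrated with a short sketch. It assumes that, on a match at accept_idx, the end token would be written into slot accept_idx + 1 of the accepted tokens. That write position is an assumption made for this illustration and is not stated in the PR. The function names are hypothetical.

```python
from typing import List


def eos_write_in_bounds(accept_idx: int, accept_num: int) -> bool:
    # Assumed write slot is accept_idx + 1; it stays inside the accepted
    # tokens of this round only if accept_idx + 1 <= accept_num - 1.
    return accept_idx <= accept_num - 2


def candidate_accept_indices(accept_num: int) -> List[int]:
    # Positions at which a stop-sequence match could still be acted on
    # under the guard; the last accepted position is excluded.
    return [idx for idx in range(accept_num) if eos_write_in_bounds(idx, accept_num)]
```

Under this reading, a stop sequence that ends on the last accepted token of a round is not acted on in that round, which would be consistent with the test name test_stop_seq_at_last_position_not_detected.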

lonelygsh had a problem deploying to Metax_ci April 2, 2026 12:55 — with GitHub Actions Failure

paddle-bot bot added the contributor External developers label Apr 2, 2026

lonelygsh force-pushed the fix-speculate-decoding-index-bugs branch from ba88df0 to 0f4325c Compare April 2, 2026 13:37

lonelygsh had a problem deploying to Metax_ci April 2, 2026 13:37 — with GitHub Actions Failure

lonelygsh force-pushed the fix-speculate-decoding-index-bugs branch from 0f4325c to 41a8185 Compare April 2, 2026 13:40

lonelygsh had a problem deploying to Metax_ci April 2, 2026 13:40 — with GitHub Actions Failure

lonelygsh force-pushed the fix-speculate-decoding-index-bugs branch from 41a8185 to 8dea198 Compare April 2, 2026 13:42

lonelygsh had a problem deploying to Metax_ci April 2, 2026 13:42 — with GitHub Actions Failure

yuanlehome approved these changes Apr 2, 2026

lonelygsh changed the title ~~[Speculative Decoding] fix mtp stop_seqs bugs~~ [Speculative Decoding] fix mtp stop_seqs and limit thinging bugs Apr 3, 2026

lonelygsh changed the title ~~[Speculative Decoding] fix mtp stop_seqs and limit thinging bugs~~ [Speculative Decoding] fix mtp stop_seqs and limit thinking bugs Apr 3, 2026

yuanlehome previously approved these changes Apr 3, 2026

lonelygsh closed this Apr 7, 2026

lonelygsh force-pushed the fix-speculate-decoding-index-bugs branch from 8dea198 to ae2f9f4 Compare April 7, 2026 07:07

lonelygsh had a problem deploying to Metax_ci April 7, 2026 07:07 — with GitHub Actions Failure

lonelygsh reopened this Apr 7, 2026

lonelygsh dismissed yuanlehome’s stale review via 52711f9 April 7, 2026 14:47

lonelygsh had a problem deploying to Metax_ci April 7, 2026 14:47 — with GitHub Actions Error

lonelygsh force-pushed the fix-speculate-decoding-index-bugs branch from 52711f9 to b37c463 Compare April 7, 2026 15:10

lonelygsh had a problem deploying to Metax_ci April 7, 2026 15:10 — with GitHub Actions Error

lonelygsh force-pushed the fix-speculate-decoding-index-bugs branch from b37c463 to dd2326a Compare April 7, 2026 15:15

lonelygsh had a problem deploying to Metax_ci April 7, 2026 15:16 — with GitHub Actions Failure

lonelygsh force-pushed the fix-speculate-decoding-index-bugs branch from dd2326a to 4ab41f1 Compare April 8, 2026 07:20

lonelygsh had a problem deploying to Metax_ci April 8, 2026 07:20 — with GitHub Actions Error

lonelygsh force-pushed the fix-speculate-decoding-index-bugs branch from 4ab41f1 to a0be6ee Compare April 8, 2026 07:53

lonelygsh had a problem deploying to Metax_ci April 8, 2026 07:53 — with GitHub Actions Error

lonelygsh force-pushed the fix-speculate-decoding-index-bugs branch from a0be6ee to 99b5c45 Compare April 8, 2026 08:15

lonelygsh had a problem deploying to Metax_ci April 8, 2026 08:15 — with GitHub Actions Failure

fastdeploy-bot reviewed Apr 8, 2026

lonelygsh wants to merge 1 commit into PaddlePaddle:develop from lonelygsh:fix-speculate-decoding-index-bugs

5 participants
